Running Title: Novel Insights into Turkey Hemorrhagic Enteritis Virus Transcriptome
Abraham Quaye1*, Bret Pickett*, Joel S. Griffitts*, Bradford K. Berges*, Brian D. Poole\({^\dagger}\)*
*Department of Microbiology and Molecular Biology, Brigham
Young University
1First-author
\({^\dagger}\) Corresponding Author
Corresponding Author Information
brian_poole@byu.edu
Department of Microbiology and Molecular Biology,
4007 Life Sciences Building (LSB),
Brigham Young University,
Provo, Utah
Background: Hemorrhagic enteritis (HE) is a disease
affecting 6-12-week-old turkeys characterized by immunosuppression
(IS) and bloody diarrhea. This disease is caused by Turkey
Hemorrhagic Enteritis Virus (THEV) of which avirulent strains
(THEV-A) that do not cause HE but retain the immunosuppressive ability
have been isolated. The THEV-A Virginia Avirulent Strain (VAS) is still
used as a live vaccine despite its immunosuppressive properties. Our
objective is to understand the genetic basis by which VAS induces
IS. The transcriptome of THEV was studied to set the stage
for further experimentation with specific viral genes that may mediate
IS.
Methods: After infecting a turkey B-cell line
(MDTC-RP19) with the VAS vaccine strain, samples in triplicates were
collected at 4-, 12-, 24-, and 72-hours post-infection. Total RNA was
subsequently extracted, and poly-A-tailed mRNA sequencing done. After
trimming the raw sequencing reads with the FastQC, reads
were mapped to the THEV genome using Hisat2 and transcripts
assembled with StringTie. An in-house script was used to
consolidate transcripts from all time-points, generating the final
transcriptome. PCR, gel electrophoresis, and Sanger sequencing were used
to validate all identified splice junctions.
Results and Conclusions: A total of
18.1 million reads mapped to THEV genome providing good
coverage/depth, leaving no regions unmapped. All predicted genes in the
genome were represented. In keeping with all adenoviruses, all
transcripts were spliced with either with 5’- or 3’-multi exon UTRs
hitherto unknown. Thirteen novel exons were
identified which were validated by PCR and Sanger sequencing. The
splicing patterns strongly suggest that there are
three main promoters (E1, E3, and major late
promoters) driving expression of most of the genes with
two possible minor promoters driving single genes
(ORF7 and ORF8). This RNA-sequencing experiment is the first study of
THEV gene expression to date. In keeping with other Adenoviruses, almost
all THEV genes are spliced, and several genes are expressed as one
transcription unit under a single promoter. This insight into THEV’s
transcriptome may allow the engineering of the VAS to provide immune
protection with less or no associated IS.
Adenoviruses (AdVs) are non-enveloped icosahedral-shaped DNA viruses, causing infection in virtually all vertebrates. Their double-stranded linear DNA genomes range between 26 and 45kb in size, producing a broad repertoire of transcripts via a highly complex alternative splicing pattern (1, 2). The AdV genome is one of the most optimally economized; both the forward and reverse DNA strands harbor protein-coding genes, making it highly gene-dense. There are 16 genes termed “genus-common” that are homologous in all AdVs; these are thought to be inherited from a common ancestor. All other genes are termed “genus-specific”. “Genus-specific” genes tend to be located at the termini of the genome while “genus-common” genes are usually central (1). This pattern is observed in Adenoviridae, Poxviridae, and Herpesviridae (1, 3, 4). The family Adenoviridae consists of five genera: Mastadenovirus (MAdV), Aviadenovirus, Atadenovirus, Ichtadenovirus, and Siadenovirus (SiAdV) (5, 6). Currently, there are three recognized members of the genus SiAdV: frog adenovirus 1, raptor adenovirus 1, and turkey adenovirus 3 also called turkey hemorrhagic enteritis virus (THEV) (5, 7–10). Members of SiAdV have the smallest genome size (~26 kb) and gene content (~23 genes) of all known AdVs, and many “genus-specific” putative genes of unknown functions have been annotated (see Figure 1)(1, 2, 7).
Virulent strains (THEV-V) and avirulent strains (THEV-A) of THEV are serologically indistinguishable, infecting turkeys, chickens, and pheasants and the THEV-V cause different clinical diseases in these birds (2, 11). In turkeys, the THEV-V cause hemorrhagic enteritis (HE), a debilitating acute disease affecting predominantly 6-12-week-old turkeys characterized by immunosuppression (IS), weight loss, intestinal lesions leading to bloody diarrhea, splenomegaly, and up to 80% mortality (11–13). HE is the most economically significant disease caused by any strain of THEV (11). While the current vaccine strain (a THEV-A isolated from a pheasant, Virginia Avirulent Strain [VAS]) have proven effective at preventing HE in young turkey poults, it still retains the immunosuppressive ability. Thus, vaccinated birds are rendered more susceptible to opportunistic infections and death than unvaccinated cohorts leading to substantial economic losses (11, 14–16). The induced IS also interferes with vaccination schemes for other infections of turkeys (11, 14). To eliminate this immunossupressive side-effect of the vaccine, a thorough investigation of the culprit viral factors (genes) mediating this phenomenon is essential. However, the transcriptome (splicing and gene expression patterns) of THEV has not been characterized, making the investigation of specific viral genes for possible roles in causing IS impractical. A well-characterized transcriptome of THEV is required to enable the next leap forward in THEV research - experimentation with specific viral genes that may mediate IS.
Myriads of studies have elucidated the AdV transcriptome in fine detail (17, 18). However, a large preponderance of studies focus on MAdVs - specifically human AdVs - thus, most of the current knowledge regarding AdV gene expression and replication is based on MAdV studies, which is generalized for all other AdVs (6, 19). MAdV genes are transcribed in a temporal manner; therefore, genes are categorized into five early transcription units (E1A, E1B, E2, E3, and E4), two intermediate (IM) units (pIX and IVa2), and one major late unit (MLTU), which generates five families of late mRNAs (L1-L5). An additional gene (UXP or U exon) is located on the reverse strand. The early genes encode non-structural proteins such as enzymes or host cell modulating proteins, primarily involved in DNA replication or providing the necessary intracellular niche for optimal replication while late genes encode structural proteins. The immediate early gene E1A is expressed first, followed by the the delayed early genes, E1B, E2, E3 and E4. Then the intermediate early genes, IVa2 and pIX are expressed followed by the late genes (6, 17, 18). MAdV makes an extensive use of alternative RNA splicing to produce a very complex array of mRNAs; all but pIX mRNA undergo at least one splicing event. The MLTU produces over 20 distinct splice variants all of which contain three non-coding exons at the 5’-end (collectively known as the tripartite leader, TPL) (17, 18). There is also an alternate 5’ three non-coding exons present in varying amounts on a subset of MLTU mRNAs (known as the x-, y- and z-leaders). Lastly, there is the i-leader exon, which is infrequently included between the second and third TPL exons, and codes for the i-leader protein (20). Thus, the MLTU produces a complex repertoire of mRNA with diverse 5’-UTRs spliced onto different 3’ coding exons which are grouped into five different 3’-end classes (L1-L5). Each transcription unit contains its own promoter driving the expression of all the array of mRNA transcripts produced via alternative splicing of the genes encoded in the unit(6, 17, 18). Almost all AdV mRNAs are generated by the excision of one or more introns and most of these introns are located in the 5’ or 3’ UTRs of pre-mRNA. Thus the viral introns scarcely interrupt the open reading frames (ORFs) (1, 18).
High throughput sequencing methods have facilitated the discovery of many novel transcribed regions and splicing isoforms. It is also a very powerful tool to study alternative splicing under different conditions at an unparalleled depth (18, 21). In this paper, a paired-end deep sequencing experiment was performed to characterize for the first time, the transcriptome of THEV (VAS vaccine strain) during different phases of the infection, yielding the first THEV splicing map. Our paired-end sequencing allowed for reading 149 bp long high quality (mean Phred Score of 36) sequences from each end of cDNA fragments, which were mapped to the genome of THEV. The generated data from our paired-end sequencing experiment should thus be reliable.
Overview of sequencing data and analysis pipeline
outputs
A previous study by Zeinab et al showed that almost all THEV
transcripts were detectable beginning at 4 hours (22). Therefore, infected MDTC-RP19 cells were
harvested at 4-, 12-, 24-, and 72-hours post-infection(h.p.i) to ensure
an amply wide time window to sample all transcripts. Our paired-end RNA
sequencing (RNA-Seq) experiment yielded an average of
107.1 million total reads of 149bp in
length per time-point. Using the HISAT2 (23) alignment program, a total of
18.1 million reads from all time-points mapped to the
virus genome; this provided good coverage/depth, leaving no regions
unmapped. The mapped reads to the virus genome increased substantially
from 432 reads at 4 h.p.i to 16.9
million reads at 72 h.p.i (Table 1, Figure
2). From the mapped reads, we identified an overall total of
2,815 THEV splice junctions from all time-points, with
splice junctions from the later time-points being supported by
significantly more sequence reads than earlier time-points. For example
all the 12 unique junctions at 4 h.p.i had less than 10
reads supporting each one, averaging a mere 3
reads/junction. Conversely, the 2516 unique junctions
at 72 h.p.i averaged 713.1 reads/junction, some
junctions having as high as 322,677 reads in support.
The substantial increases in splice junctions and mapping reads to the
THEV genome over time denotes the progression of the infection and
correlates with our qPCR assay quantifying the total number of viral
genome copies over time (Figure 3). Using
StringTie (23), an assembler
of RNA-Seq alignments into potential transcripts, the mapped reads for
each time-point were assembled into transcripts using the genomic
location of the predicted THEV ORFs as a guide. In the final
consolidated transcriptome of THEV generated from transcripts from all
time points, we counted a total of 51 transcripts all
of which are novel. Although some exons in some transcripts match the
predicted ORFs of THEV exactly, most of the exons are longer, spanning
multiple predicted ORFs (Figure 4). The complete list
of splice junctions mapped to THEV’s genome has been submitted to the
National Center for Biotechnology Information Gene Expression Omnibus
(http://www.ncbi.nlm.nih.gov/geo) under accession
no. XXXXXX.
Early Region 1 (E1) transcripts. This region in
MAdVs is the first transcribed after successful entry of the viral DNA
into the nucleus of the host cell. The host transcription machinery is
lobbied for the transcription of the region. No viral proteins are
available at this stage; hence, host transcription factors play the key
roles of mobilizing the transcription machinery. After their
translation, the E1 proteins in concert with the host transcription
factors activate the other viral promoters (6). We identified four novel transcripts in
this region and also validated the predicted ORF (sialidase;
ORF1).
Early Region 2A (E2A) transcripts.
Early Region 2B (E2B) transcripts.
Early Region 3 (E3) transcripts.
Early Region 4 (E4) transcripts.
Intermediate Region (IM) transcripts.
Major Late Promoter Region (MLP) transcripts.
The Turkey B-cell line (MDTC-RP19, ATCC CRL-8135) was grown as suspension cultures in 1:1 complete Leibovitz’s L-15/McCoy’s 5A medium with 10% fetal bovine serum (FBS), 20% chicken serum (ChS), 5% tryptose phosphate broth (TPB), and 1% antibiotics solution (100 U/mL Penicillin and 100ug/mL Streptomycin), at 41oC in a humidified atmosphere with 5% CO2. Infected cells were maintained in 1:1 serum-reduced Leibovitz’s L15/McCoy’s 5A media (SRLM) with 2.5% FBS, 5% ChS, 1.2% TPB, and 1% antibiotics solution (100 U/mL Penicillin and 100ug/mL Streptomycin). A commercially available HE vaccine was purchased from Hygieia Biological Labs as a source of THEV-A (VAS strain). The stock virus was titrated using an in-house qPCR assay with titer expressed as genome copy number(GCN)/mL, similar to Mahshoub et al(24) with modifications. Cells were infected at a multiplicity of infection (MOI) of 100 GCN/cell and samples in triplicates were harvested at 4-, 12-, 24-, and 72-h.p.i for RNA-Seq. The infection was repeated but samples in triplicates were harvested at 12-, 24-, 36-, 48-, and 72-h.p.i for PCR validation of novel splice sites.
Total RNA was extracted from infected cells using Thermofishers’
RNAqueous™-4PCR Total RNA Isolation Kit (#AM1914) as per
manufacturer’s instructions. An agarose gel electrophoresis was
performed to check RNA integrity. The RNA quantity and purity was
initially assessed using nanodrop, and RNA was used only if
the A260/A280 ratio was 2.0 ± 0.05 and the A260/A230 ratio was >2 and
<2.2. Extracted total RNA samples were sent to
LC Sciences, Houston TX for poly-A-tailed mRNA sequencing
where RNA integrity was checked with
Agilent Technologies 2100 Bioanalyzer High Sensitivity DNA Chip
and poly(A) RNA-Seq library was prepared following
Illumina's TruSeq-stranded-mRNA sample preparation protocol.
Paired-end sequencing was performed on
Illumina's NovaSeq 6000 sequencing system.
Analysis of our sequence reads were analyzed following a well
established protocol described by Pertea et al (23), using SNAKEMAKE 7.24.0 to
drive the pipeline. Briefly, sequencing reads were trimmed with the
FastQC - version 0.11.9 (25)
program to achieve an overall
Mean Sequence Quality (Phred Score) of 36. Trimmed reads
were mapped to the complete sequence of avirulent turkey hemorrhagic
enteritis virus strain Virginia (https://www.ncbi.nlm.nih.gov/nuccore/AY849321.1/) and
Meleagris gallopavo (https://www.ncbi.nlm.nih.gov/genome/?term=Meleagris+gallopavo)
using Hisat2 - version 2.2.1 (23) with default settings without relying on
known splice sites. The generated BAM files from each
infection time-point were filtered for reads mapping to the THEV genome
and fed into StringTie - version 2.2.1 (23) using a gff3 file from NCBI
containing the predicted ORFs of THEV as a guide. A custom script was
used to consolidate all transcripts from all time-points without
redundancy, generating the final transcriptome of THEV.
All splice junctions identified in this work are novel except one predicted splice site each for pTP and DBP, which were corroborated in our work. However, these predicted splice junctions have not been experimentally validated hitherto, and we identified additional novel splice junctions beside the predicted junctions, giving a more complete picture of the transcripts.
The novel splice junctions after consolidating all transcripts with
StringTie which we validated by PCR and Sanger Sequencing
are shown in Table###1. We designed primers that crossed a
range of novel exon–exon boundaries for each specific transcript in a
transcription unit with their respective universal primers
(supplementary, PCR methods). Each forward primer contained a
KpnI restriction site and reverse primers, an XbaI site. After
first-strand cDNA synthesis with SuperScript™ III First-Strand Synthesis
System (ThermoFisher SCIENTIFIC), these primers were used in a targeted
PCR experiment, the PCR products were analysed on Agarose gels, cloned
by traditional restriction enzyme method and Sanger sequenced to
validate these splice junctions at the sequence level.
All the code/scripts written for analysis of the data is available on github (linkXXXXXX)
LC Sciences Eton Bioscience, Inc, San Diego, CA
Figure 1.
Genomic map of THEV avirulent strain. The central
horizontal line represents the double-stranded DNA marked at 5kb
intervals as white line breaks. Blocks represent viral genes. Blocks
above the DNA line are transcribed rightward, those below are
transcribed leftward. pTP, DBP and 33K predicted to be spliced are shown
as having tails. Shaded regions indicate regions containing
“genus-specific” genes (colored red). Genes colored in blue are
“genus-common”. Gene colored in light green is conserved in all but
Atadenoviruses. The UXP (light blue) is an incomplete gene present in
almost all AdVs. Regions comprising the different transcription units
are labelled at the bottom (E1, E2A, E2B, E3, E4, and IM); the unlabeled
regions comprise the MLTU.
| Metric | 4h.p.i | 12h.p.i | 24h.p.i | 72h.p.i | Total |
|---|---|---|---|---|---|
| Total reads | 1.17e+08 | 7.63e+07 | 1.20e+08 | 1.15e+08 | 4.283000e+08 |
| Mapped (Host) |
1.04e+08 (89.06%) |
6.79e+07 (89.0393%) |
1.06e+08 (88.2719%) |
8.38e+07 (72.9802%) |
3.617000e+08 |
| Mapped (THEV) |
4.32e+02 ( 0.0004%) |
6.70e+03 ( 0.0088%) |
1.18e+06 ( 0.9841%) |
1.69e+07 (14.6904%) |
1.808713e+07 |
| Mean Per Base Coverage/Depth |
2.42 | 37.71 | 6,666.96 | 95,041.7 | 1.017488e+05 |
| Total unique splice junctions |
12 | 41 | 246 | 2516 | 2.815000e+03 |
| Junction coverage Total (≥ 1 read) |
36 | 603 | 115066 | 1794275 | 1.909980e+06 |
| Junction coverage Mean reads |
3 | 14.7073170731707 | 467.747967479675 | 713.14586645469 | 1.198601e+03 |
| Junction coverage ≥ 10 reads |
0 | 13 | 132 | 1750 | 1.895000e+03 |
| Junction coverage ≥ 100 reads |
0 | 1 | 53 | 773 | 8.270000e+02 |
| Junction coverage ≥ 1000 reads |
0 | 0 | 18 | 147 | 1.650000e+02 |
Figure 2. Per base coverage of sequence reads mapping to THEV genome by time point. The pileup of mRNA reads mapping to THEV genome at the base-pair level for each indicated time point show remarkable difference in terms of quantities. There is a dramatic increase of mean coverage/depth from 2.42 at 4 h.p.i to 95,042 at 72 h.p.i, strongly demonstrating an active infection. Unexpectedly, the pileup of reads seems consistently skewed over similar regions of the genome. We could speculate that the temporal gene expression regulation of THEV is different from MAdVs or this could simply mean that the infection was not well synchronized. However, the relative proportions over these similar regions shows some variation over time.
Figure
3. One-step growth of THEV (VAS vaccine strain) in MDTC-RP19 cell
line. After infecting cells at an MOI of 100 GCN/cell,
triplicates of harvested infected cells were quantified with an in-house
qPCR assay measuring the total copies of THEV genome. There is no
discernible increase in virus titer up 12 h.p.i, after which there is a
steady increase in virus titer is measured. The virus titer expands
exponentially beginning from 48 h.p.i, increasing by orders of magnitude
before reaching a plateau at 120 h.p.i, probably due to high cell death.
GCN: genome copy number.
Figure
4a. Full transcriptome of THEV. THEV
transcripts assembled from all time points by
StringTie are
unified forming this final transcriptome (splicing map). Transcripts
belonging to the same transcription unit (TU) are located in close
proximity on the genome and are color coded and labeled in this figure
as such. The organization of TUs in the THEV genome is unsurprisingly
similar to MAdVs; however, the MAdV genome shows significantly more
transcripts. The TUs are color coded: E1 transcripts - red, E2 - black,
E3 - dark grey, E4 - green, MLP - blue. Predicted ORFs are also
indicated here, colored light grey.
Figure 4b. THEV transcripts identified at given time
points. Transcripts are color coded as explained in
Fig.4a.